Remove some randomization in the example and in the code base. #344
Codecov Report: ✅ All modified and coverable lines are covered by tests.
@@            Coverage Diff             @@
##             main     #344      +/-   ##
==========================================
- Coverage   98.08%   98.07%   -0.01%
==========================================
  Files          22       22
  Lines        1147     1144       -3
==========================================
- Hits         1125     1122       -3
  Misses         22       22
bthirion left a comment:
The PR LGTM overall.
jpaillard left a comment:
I can't reproduce the reproducibility issue you describe for CFI and knockoffs. Can you point more precisely to the problematic example/test?
The problem is with …. The difference is quite minor, but there is always some variation.
jpaillard left a comment:
Thanks for the details. I could reproduce the issue.
It seems that the problem comes from nested parallel loops: try_reproducibility.run_joblib is called in parallel, and it in turn parallelizes plot_knockoff_aggregation.single_run.
The inner processes might unpredictably inherit some state from the parent.
To fix it, you can use Parallel(n_jobs=<nb_of_jobs>, require='sharedmem') or simply set n_jobs=1 in try_reproducibility.
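A minimal sketch of the suggested fix, assuming joblib and NumPy are installed. `single_run`, `run_all` and `try_reproducibility` below are hypothetical stand-ins for `plot_knockoff_aggregation.single_run` and `try_reproducibility.run_joblib`, not the project's actual code. Each inner task builds its generator from an explicit integer seed, and the outer loop passes `require="sharedmem"`, which makes joblib use a shared-memory (thread-based) backend, so no run depends on random state inherited from a parent process.

```python
import numpy as np
from joblib import Parallel, delayed


def single_run(seed, n_samples=100):
    # Hypothetical single simulation: every draw comes from a generator
    # built from the seed passed in explicitly, never from global state.
    rng = np.random.default_rng(seed)
    return rng.normal(size=n_samples).mean()


def run_all(seeds, n_jobs_inner=2):
    # Inner parallel loop over explicitly seeded runs.
    return Parallel(n_jobs=n_jobs_inner)(
        delayed(single_run)(seed) for seed in seeds
    )


def try_reproducibility(n_repeats=3, n_jobs_outer=2):
    seeds = list(range(10))
    # Outer loop: require="sharedmem" restricts joblib to a shared-memory
    # backend; setting n_jobs_outer=1 would be the other option.
    return Parallel(n_jobs=n_jobs_outer, require="sharedmem")(
        delayed(run_all)(seeds) for _ in range(n_repeats)
    )


results = try_reproducibility()
# Every repetition should yield exactly the same values.
print(all(np.array_equal(results[0], r) for r in results[1:]))
```

Passing seeds explicitly is what makes the output independent of how joblib schedules or nests the workers; the `sharedmem` requirement then removes the process-inheritance path described above.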
I fixed the issue for plot_knockoff_aggregation.

For …

There is still a bit of uncontrolled randomness in plot_knockoff_aggregation, but it will be easier to debug after reformatting with the new API.
I updated the seed management, because I forgot that it is better to set seeds from a range of values than to draw them from a random generator: a random generator can produce the same number twice.
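A small illustration of why seeds taken from a range are safer than seeds drawn from a generator, assuming NumPy. The upper bound of 10 for the drawn seeds is deliberately tiny so that a collision is certain; with a larger bound collisions become rare but remain possible. `base_seed` is a hypothetical starting value.

```python
import numpy as np

n_runs = 20

# Drawing seeds from a random generator: nothing prevents two draws from
# being equal, in which case two "independent" runs would be identical.
rng = np.random.default_rng(0)
drawn_seeds = rng.integers(0, 10, size=n_runs)

# Taking seeds from a range guarantees a distinct seed for every run.
base_seed = 0  # hypothetical starting value
range_seeds = list(range(base_seed, base_seed + n_runs))

print(len(set(drawn_seeds.tolist())) < n_runs)  # True: duplicates exist
print(len(set(range_seeds)) == n_runs)          # True: all distinct
```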
jpaillard left a comment:
Thank you.
I have one last comment.
bthirion left a comment:
I have some very minor comments.
There is no need to make KFold random in the examples (it is better to use ShuffleSplit if we want a random splitter, but then that is the user's choice).
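A short sketch of the point above, assuming scikit-learn and NumPy. With its default `shuffle=False`, `KFold` produces contiguous folds and needs no seed, so repeated calls to `split()` give the same folds. When a random splitter is actually wanted, `ShuffleSplit` makes that explicit, and its `random_state` becomes the user's decision. The toy `X` is illustrative only.

```python
import numpy as np
from sklearn.model_selection import KFold, ShuffleSplit

X = np.arange(20).reshape(10, 2)  # 10 toy samples

# Deterministic splitter: no shuffling, so no random_state is needed.
kfold = KFold(n_splits=5)
folds_a = [test.tolist() for _, test in kfold.split(X)]
folds_b = [test.tolist() for _, test in kfold.split(X)]
print(folds_a == folds_b)  # True
print(folds_a[0])          # [0, 1]

# Explicitly random splitter; the seed is chosen by the user.
shuffle = ShuffleSplit(n_splits=3, test_size=0.2, random_state=0)
splits = [test.tolist() for _, test in shuffle.split(X)]
```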
Co-authored-by: bthirion <[email protected]>
Last review before merging.
jpaillard left a comment:
To be consistent with the changes suggested in #360, we should avoid showcasing RandomState in the examples and replace it with np.random.default_rng.
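A minimal before/after sketch of that replacement in an example script, assuming NumPy. `RandomState.randint` corresponds to `Generator.integers`, while `normal` keeps its name. Note that the two APIs produce different streams for the same seed, so switching changes the numbers an example generates, although each version stays reproducible on its own.

```python
import numpy as np

# Legacy style that the examples should no longer showcase.
legacy = np.random.RandomState(0)
x_legacy = legacy.normal(size=(5, 3))
idx_legacy = legacy.randint(0, 10, size=3)

# Recommended style: a Generator created with np.random.default_rng.
rng = np.random.default_rng(0)
x_new = rng.normal(size=(5, 3))
idx_new = rng.integers(0, 10, size=3)

# Re-creating the generator with the same seed reproduces the data.
rng_again = np.random.default_rng(0)
print(np.array_equal(x_new, rng_again.normal(size=(5, 3))))  # True
```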
Co-authored-by: Joseph Paillard <[email protected]>
jpaillard left a comment:
Looks good, thx
I tried to improve the reproducibility of the examples by setting seeds in them and by managing the random generator better in the code base.
However, some randomization remains in PermutationFeatureImportance and in Model_X_Knockoff.
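One common way to manage the random generator in a code base is to accept a `random_state` that may be `None`, an int, or an `np.random.Generator`, and to normalize it with `np.random.default_rng`, which returns a Generator unchanged. The sketch below assumes NumPy; `permute_column` is a hypothetical helper and is not claimed to be how PermutationFeatureImportance is implemented.

```python
import numpy as np


def permute_column(X, j, random_state=None):
    """Hypothetical helper: return a copy of X with column j shuffled.

    random_state may be None, an int seed, or an np.random.Generator;
    np.random.default_rng returns a Generator unchanged, so callers can
    share one generator across several calls.
    """
    rng = np.random.default_rng(random_state)
    X_perm = X.copy()
    X_perm[:, j] = rng.permutation(X_perm[:, j])
    return X_perm


X = np.arange(12, dtype=float).reshape(6, 2)

# Same int seed -> same permutation on every run.
a = permute_column(X, 0, random_state=42)
b = permute_column(X, 0, random_state=42)
print(np.array_equal(a, b))  # True

# One shared Generator -> successive calls advance a single stream,
# so the whole sequence is reproducible from one seed.
rng = np.random.default_rng(42)
first = permute_column(X, 0, random_state=rng)
second = permute_column(X, 0, random_state=rng)
```

Passing a shared Generator rather than re-seeding inside each call avoids the duplicate-seed problem discussed earlier, since every call draws from the same stream.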